The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank

Authors

  • Matthew Richardson
  • Pedro M. Domingos
Abstract

The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page to page, following all outlinks from a page with equal probability. We propose to improve PageRank by using a more intelligent surfer, one that is guided by a probabilistic model of the relevance of a page to a query. Efficient execution of our algorithm at query time is made possible by precomputing at crawl time (and thus once for all queries) the necessary terms. Experiments on two large subsets of the Web indicate that our algorithm significantly outperforms PageRank in the (human-rated) quality of the pages returned, while remaining efficient enough to be used in today’s large search engines.
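The contrast between the random surfer and the intelligent surfer can be illustrated with a small power-iteration sketch. The abstract does not give the model's equations, so everything here is an assumption made for illustration: the function name `surfer_rank`, the damping value of 0.85, the toy graph, and the relevance scores are all hypothetical. The sketch assumes the surfer follows an outlink with probability proportional to the target page's relevance to the query, and jumps to a page with probability proportional to that page's relevance. With uniform relevance it reduces to the equal-probability surfer described above.

```python
def surfer_rank(links, relevance=None, damping=0.85, iterations=100):
    """Power iteration for a random or relevance-guided surfer (illustrative).

    links:     dict mapping each page to the list of pages it links to;
               every link target must also be a key of the dict.
    relevance: optional dict mapping each page to a non-negative relevance
               score for the query. None means uniform relevance, which
               gives the plain equal-probability surfer.
    """
    pages = list(links)
    if relevance is None:
        relevance = {p: 1.0 for p in pages}
    total = sum(relevance[p] for p in pages)
    if total <= 0:
        raise ValueError("at least one page must have positive relevance")

    # Assumed jump distribution: proportional to relevance to the query.
    jump = {p: relevance[p] / total for p in pages}
    rank = dict(jump)

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for src in pages:
            out_total = sum(relevance[t] for t in links[src])
            if out_total == 0:
                # No outlinks (or none relevant): the surfer jumps instead.
                for p in pages:
                    new_rank[p] += damping * rank[src] * jump[p]
            else:
                # Follow each outlink in proportion to the target's relevance.
                for t in links[src]:
                    new_rank[t] += damping * rank[src] * relevance[t] / out_total
        rank = new_rank
    return rank


# Hypothetical three-page graph and made-up relevance scores.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
relevance = {"a": 0.1, "b": 0.8, "c": 0.1}

plain = surfer_rank(links)              # equal-probability surfer
guided = surfer_rank(links, relevance)  # relevance-guided surfer
```

In this toy graph, page `b` receives a larger share of the guided surfer's visits than of the plain surfer's, because both the jumps and the choice among `a`'s outlinks favor it. The scores depend on the query only through `relevance`, which is why a per-term precomputation at crawl time, as the abstract describes, is plausible; the sketch itself does not implement that precomputation.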

Similar resources

An Intelligent Surfer Model Based on Combining Web Contents and Links

The PageRank algorithm is an iterative algorithm used in the Google search engine to improve the results of requests by taking into account the link structure of the web. More interesting and intelligent surfer models combining the link and content information in PageRank have been proposed in the literature. The main disadvantage of those models is that the combination of single word PageRank t...

The Generalized Web Surfer

Different models have been proposed for improving the results of Web search by taking into account the link structure of the Web. The PageRank algorithm models the behavior of a random surfer alternating between random jumps to new pages and following out links with equal probability. We propose to improve on PageRank by using an intelligent surfer that combines link structure and content to de...

Topic Continuity for Web Document Categorization and Ranking

PageRank is primarily based on link structure analysis. Recently, it has been shown that content information can be utilized to improve link analysis. We propose a novel algorithm that harnesses the information contained in the history of a surfer to determine his topic of interest when he is on a given page. As the history is unavailable until query time, we guess it probabilistically so that ...

Interview - ICT Projects in Serbia

Standard techniques for estimating web page relevance neglect the information provided by the visual layout of the page. The most popular link-based technique, PageRank, uses a model of a random surfer to estimate the probability that he visits a page at any given time. This probability is assumed to be proportional to the relevance of the page. PageRank considers all ou...

A Study on Ranking Method in Retrieving Web Pages Based on Content and Link Analysis: Combination of Fourier Domain Scoring and PageRank Scoring

The ranking module is an important component of the search process, which sorts through relevant pages. Since a collection of Web pages has additional information inherent in the hyperlink structure of the Web, it can be represented as a link score and then combined with the usual information retrieval techniques of content score. In this paper we report our studies about ranking score of Web pages combined...

Publication date: 2001